Analysis and Determination of Inner Lip Texture Descriptors for Visual Speech Representation

Authors

  • Xibin Jia
  • Hua Du
  • Yanfang Han
  • David M. W. Powers
Abstract

Visual speech representation for bimodal speech recognition poses particular challenges in modeling the inner lip texture, which reflects different pronunciations through cues such as the visibility of the teeth and tongue. This paper proposes and analyzes several candidate statistical inner lip texture descriptors in order to determine an effective and discriminative feature. Using grayscale alone, without fully specifying the underlying color model, tends to lose significant discriminative information. A primary goal of the present research is therefore a thorough exploration of which color space components to use when computing the local inner lip texture. The L channel of the Lab color space is ultimately selected as the basis for the inner lip texture model. Through feature-level fusion, visual speech is then classified using the proposed inner lip texture descriptor together with standard geometric features. Combined with audio speech, this work extends the development of a CHMM-based bimodal Chinese character pronunciation recognition system. The experimental results show that local inner texture descriptors, such as color moments combined with geometric features, outperform holistic inner texture descriptors, such as the statistical histogram, for representing visual speech: they achieve comparable discriminability at a lower dimensionality.
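As a rough illustration of the pipeline summarized in the abstract, the following Python/NumPy sketch computes the CIE L* channel of an sRGB mouth image, extracts local color moments (mean, standard deviation, and the cube root of the third central moment) over cells inside an inner-lip mask, computes a holistic normalized L* histogram for comparison, and fuses each texture vector with a geometric feature vector by concatenation. This is not the authors' implementation. The sRGB/D65 conversion, the 2x2 cell grid, the 32-bin histogram, plain concatenation as the fusion step, the synthetic inputs, and all helper names are assumptions made for illustration, and the inner-lip mask is taken as given rather than detected.

```python
import numpy as np


def srgb_to_lab_lightness(rgb):
    """Return CIE L* (0..100) for an sRGB image with channel values in [0, 1].

    Assumes a D65 white point; L* depends only on relative luminance Y.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to get linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y with reference white Yn = 1.
    y = linear @ np.array([0.2126729, 0.7151522, 0.0721750])
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3.0 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0


def color_moments(values):
    """Mean, standard deviation and skewness (cube root of the 3rd central moment)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    mean = values.mean()
    std = values.std()
    skew = np.cbrt(((values - mean) ** 3).mean())
    return np.array([mean, std, skew])


def local_moment_descriptor(lightness, mask, grid=(2, 2)):
    """Local descriptor: color moments of L* inside the inner-lip mask, per grid cell.

    The grid layout is a hypothetical choice; empty cells contribute zeros.
    """
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    # Crop to the bounding box of the inner-lip region before splitting into cells.
    box = (slice(rows.min(), rows.max() + 1), slice(cols.min(), cols.max() + 1))
    lightness, mask = lightness[box], mask[box]
    feats = []
    for l_band, m_band in zip(np.array_split(lightness, grid[0], axis=0),
                              np.array_split(mask, grid[0], axis=0)):
        for l_cell, m_cell in zip(np.array_split(l_band, grid[1], axis=1),
                                  np.array_split(m_band, grid[1], axis=1)):
            pixels = l_cell[m_cell]
            feats.append(color_moments(pixels) if pixels.size else np.zeros(3))
    return np.concatenate(feats)


def histogram_descriptor(lightness, mask, bins=32):
    """Holistic descriptor: normalized L* histogram over all inner-lip pixels."""
    hist, _ = np.histogram(lightness[np.asarray(mask, dtype=bool)], bins=bins, range=(0.0, 100.0))
    return hist / max(hist.sum(), 1)


def fuse_features(texture, geometric):
    """Feature-level fusion by plain concatenation (assumed; no normalization or weighting)."""
    return np.concatenate([np.asarray(geometric, dtype=np.float64),
                           np.asarray(texture, dtype=np.float64)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((40, 60, 3))  # synthetic stand-in for a mouth ROI
    yy, xx = np.mgrid[:40, :60]
    inner = ((yy - 20) / 10) ** 2 + ((xx - 30) / 20) ** 2 <= 1.0  # hypothetical inner-lip mask
    lightness = srgb_to_lab_lightness(frame)
    geometric = [52.0, 18.5, 9.0]  # hypothetical mouth width, outer and inner lip heights
    print(fuse_features(local_moment_descriptor(lightness, inner), geometric).shape)  # (15,)
    print(fuse_features(histogram_descriptor(lightness, inner), geometric).shape)  # (35,)
```

With a 2x2 grid the local descriptor has 12 dimensions versus 32 for the histogram, which loosely mirrors the abstract's point about the lower dimensionality of the local moment descriptor. In practice each feature group would likely be normalized before concatenation, since L* statistics and pixel distances live on different scales.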

Similar resources

Visual analysis of viseme dynamics

Face-to-face dialogue is the most natural mode of communication between humans. The combination of human visual perception of expression and perception of changes in intonation provides semantic information that communicates ideas, feelings and concepts. The realistic modelling of speech movements, through automatic facial animation, and maintaining audio-visual coherence is still a challenge in...

A Kind of Visual Speech Feature with the Geometric and Local Inner Texture Description

In this paper, we propose a type of joint feature with geometric parameters and color moments to represent the speaking-mouth frames for image-based visual speech synthesis systems. Based on FDP around the mouth area, the geometric feature is obtained by computing Euclidean distances to describe the width of the speaking mouth, the height of the outer and inner lips and the distances between th...

Kernel-based Speaker Verification Using Spatiotemporal Lip Information

The lip-region can be interpreted as either a genetic or behavioural biometric trait. Despite this breadth of biometric content, lip-based biometric systems are scarcely developed in the literature. A recent trend in lip biometrics is to use a spatiotemporal texture representation of visual speech to generate biometric features. In this paper we make two contributions related to the above biome...

Lip-reading from parametric lip contours for audio-visual speech recognition

This paper describes the incorporation of a visual lip tracking and lip-reading algorithm that utilizes the affine-invariant Fourier descriptors from parametric lip contours to improve the audio-visual speech recognition systems. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), where a joint decision, using an optimal decision rule, is made af...

Multimodal speaker/speech recognition using lip motion, lip texture and audio

We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before. The emphasis of this work is to investigate the benefits of inclusion of lip motion modality for two distinct cases: speaker and speech recognition. The audio modality is represente...

Journal title:
  • JCP

Volume 9, Issue

Pages -

Publication date 2014